Neighborhood Clustering of Web Users With Rough K-Means

Authors

  • RITU SONI
  • RAJEEV NANDA
Abstract

Data collection and analysis in web mining face certain unique challenges. For a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than in conventional applications, and the analytical techniques used in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuzzy set theory has been shown to be useful in three important aspects of web and data mining, namely clustering, association, and sequential analysis. There is also increasing interest in clustering based on rough set theory. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. For example, the clusters and associations in web mining do not necessarily have crisp boundaries. As a result, researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have used genetic algorithms based on rough set theory for clustering. However, clustering based on genetic algorithms may not be able to handle the large amount of data typical of a web mining application. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets topology.
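The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common formulation of rough K-means from the literature, not necessarily the variant the paper proposes. Each cluster is represented by a lower approximation (objects that definitely belong) and a boundary region (objects that may also belong to other clusters). The names `rough_k_means`, `w_lower`, and `threshold`, the ratio test for near-ties, and the sample data are all assumptions made for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def rough_k_means(points, centroids, w_lower=0.7, threshold=1.3, iterations=20):
    # w_lower and threshold are hypothetical tuning parameters:
    # w_lower weights the lower approximation in the centroid update,
    # threshold decides when a second cluster is "almost as close".
    k = len(centroids)
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        lower = [[] for _ in range(k)]
        boundary = [[] for _ in range(k)]
        for p in points:
            d = [dist(p, c) for c in centroids]
            nearest = min(range(k), key=d.__getitem__)
            # Other clusters whose distance is within the ratio threshold.
            close = [j for j in range(k)
                     if j != nearest and d[j] <= threshold * d[nearest]]
            if close:
                # Ambiguous object: boundary of every candidate cluster.
                for j in [nearest] + close:
                    boundary[j].append(p)
            else:
                # Unambiguous object: lower approximation of one cluster.
                lower[nearest].append(p)
        for j in range(k):
            if lower[j] and boundary[j]:
                lo, bd = mean(lower[j]), mean(boundary[j])
                centroids[j] = [w_lower * a + (1 - w_lower) * b
                                for a, b in zip(lo, bd)]
            elif lower[j]:
                centroids[j] = mean(lower[j])
            elif boundary[j]:
                centroids[j] = mean(boundary[j])
    # lower/boundary reflect the assignment made in the final iteration.
    return centroids, lower, boundary

# Made-up data: two tight groups plus one point midway between them.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5.5, 5.5)]
centroids, lower, boundary = rough_k_means(points, [(0, 0), (10, 10)])
```

In this sketch the midway point ends up in the boundary region of both clusters rather than being forced into one, which is the kind of non-crisp membership the abstract argues web usage data calls for.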

Similar resources

A Reasonable Rough Approximation for Clustering Web Users

Due to the uncertainty in accessing web pages, analysis of web logs faces some challenges. Several rough k-means cluster algorithms have been proposed and successfully applied to web usage mining. However, they did not explain why rough approximations of these cluster algorithms were introduced. This paper analyzes the characteristics of the data in the boundary areas of clusters, and then a ro...

Interval set clustering of web users using modified Kohonen self-organizing maps based on the properties of rough sets

Web usage mining involves the application of data mining techniques to discover usage patterns from web data. Clustering is one of the important functions in web usage mining. The likelihood of bad or incomplete web usage data is higher than in conventional applications. The clusters and associations in web usage mining do not necessarily have crisp boundaries. Researchers have studied the pos...

An Efficient Approach for Clustering Web Access Patterns from Web Logs

The interests of web users can be revealed by the web pages they visit and the time they spend on those pages while surfing. The time spent on a web page is characterized as a fuzzy linguistic variable, because a linguistic variable lets users easily understand the expression of time duration and disregards subtle differences between two time durations. Each web access pattern from web logs ...

Analysis of Click Stream Patterns using Soft Biclustering Approaches

As websites increase in complexity, locating needed information becomes a difficult task. Such difficulty is often related to the websites’ design but also to ineffective and inefficient navigation processes. Research in web mining addresses this problem by applying techniques from data mining and machine learning to web data and documents. In this study, the authors examine web usage mining, appl...

Cluster Analysis Using Rough Clustering and k-Means Clustering

Introduction: Cluster analysis is a fundamental data reduction technique used in the physical and social sciences. It is of potential interest to managers in Information Science, as it can be used to identify user needs through segmenting users such as Web site visitors. In addition, the theory of rough sets is the subject of intense interest in computational intelligence research. The extension ...

Publication date: 2007